Abstract: Today, firms need to process multi-petabyte (PB) datasets efficiently. The data may not have a rigid blueprint (or schema) suited to such an enormous system, and it is expensive to build reliability into each application that processes petabytes of data. In a cluster of nodes, failures occur regularly; failure is the norm rather than the exception. Moreover, the number of nodes in a cluster is not constant. There is therefore a need for a common infrastructure that is efficient, reliable, and available under an *Open Source Apache License*. The Hadoop platform was designed to solve problems where you have a lot of data, perhaps a mixture of complex and structured information, that does not fit nicely into tables. It is for situations where you want to run analytics that are deep and computationally extensive, like targeting and clustering. That is precisely what *Google* was doing when it was indexing the web (www) and analysing user behaviour to improve the performance of its algorithms. This report attempts to analyze Hadoop's need, purpose, and applications, and to bring them to the notice of the readers.
Keywords: Petabyte (PB), NameNode, MapReduce, HDFS.
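For readers unfamiliar with the MapReduce model named in the keywords, the following is the canonical Hadoop word-count job, included here as a minimal sketch of how a computation splits into a map phase (emit a count of 1 per word) and a reduce phase (sum counts per word). The class name and the input/output paths taken from the command line are illustrative, not part of the paper.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in a line of input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir (illustrative)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir (illustrative)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because the framework handles data distribution across HDFS, task scheduling, and re-execution of failed tasks, the application code above contains no failure-handling logic of its own, which is the point the abstract makes about not building reliability into each application.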